Reproducible data collection

#| label: setup
#| echo: false
#| eval: true
#| message: false

library(knitr)
library(tidyverse)
library(lubridate)

Content

PFTC Courses

Design spreadsheet

Spreadsheet content

  • ID (unique ID for each observation, individual)
  • Date, time, observation number
  • Location: region/site
  • Experimental design: block, plot, replicate, number of observations, treatments
  • Organism: species/population/genet
  • Response
  • Predictors
  • Metadata: recorder/scribe, weather, notes
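
The list above could translate into a data sheet template along these lines. This is a minimal sketch: all column names and values are illustrative assumptions, not taken from the course material.

```r
library(tidyverse)

# Hypothetical field-sheet template: one row per observation.
# Column names and values are made up for illustration.
field_data <- tribble(
  ~id,      ~date,        ~site,    ~block, ~plot, ~treatment, ~species,            ~leaf_area_cm2, ~recorder, ~notes,
  "obs001", "2024-07-01", "site_a", 1,      1,     "control",  "Salix polaris",     2.4,            "AB",      NA_character_,
  "obs002", "2024-07-01", "site_a", 1,      2,     "warming",  "Bistorta vivipara", 3.1,            "AB",      "leaf damaged"
)

# The ID column should identify each observation uniquely
stopifnot(!anyDuplicated(field_data$id))
```

Here `leaf_area_cm2` stands in for the response and `treatment` for a predictor; the unit is part of the column name so that it travels with the data.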

Design spreadsheet - data validation

Rectangular spreadsheet - good example

  • The spreadsheet is rectangular.
  • It has no empty cells, rows, or columns, and no titles or double headers.
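
Once a spreadsheet is imported, these properties can be checked in code. The sketch below assumes the data are already in a tibble called `raw` (a hypothetical name, with made-up values) and that empty cells were read as `NA`.

```r
library(tidyverse)

# Illustrative data with one empty cell and one completely empty column
raw <- tibble(
  plot   = c(1, 2, 3),
  height = c(10.2, NA, 8.7),
  extra  = c(NA, NA, NA)
)

# Number of empty (NA) cells in each column
empty_cells <- raw |>
  summarise(across(everything(), ~ sum(is.na(.x))))

# Names of columns that contain no data at all
empty_cols <- names(raw)[colSums(is.na(raw)) == nrow(raw)]

# Row numbers of rows that contain no data at all
empty_rows <- which(rowSums(is.na(raw)) == ncol(raw))
```

In this example `empty_cols` is `"extra"` and `empty_rows` is empty, because every row has at least a plot number.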

Figure 1

Rectangular spreadsheet - bad example

Figure 2

Long or wide format

Figure 3

Single value per cell

Tidy spreadsheets follow three rules:

  • each variable has its own column,
  • each observation has its own row,
  • each cell, at the intersection of a row and a column, contains a single value.
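
A wide table can be reshaped to follow these rules with `pivot_longer()` from tidyr. The table below is a minimal, made-up example; the column names `species_a` and `species_b` and the counts are assumptions for illustration.

```r
library(tidyverse)

# Illustrative wide table: one column per species, values are made-up counts
wide <- tibble(
  plot      = c("p1", "p2"),
  species_a = c(3, 0),
  species_b = c(1, 5)
)

# Long format: one row per plot and species, one column per variable
long <- wide |>
  pivot_longer(cols = -plot, names_to = "species", values_to = "count")

# pivot_wider() reverses the operation
wide_again <- long |>
  pivot_wider(names_from = species, values_from = count)
```

In the long table, species is a variable in its own column rather than being spread across column headers.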

Figure 4: Wide (A) and long (B) data table.

Consistency

Figure 5: Inconsistency in species names
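
Inconsistent spellings of this kind can be detected and partly repaired in code. The following is a sketch with made-up species entries; it only handles differences in capitalisation and whitespace, not misspellings.

```r
library(tidyverse)

# Illustrative entries: the same species typed in four different ways
species_raw <- tibble(
  species = c("Salix polaris", "salix polaris", "Salix polaris ",
              "Salix  polaris", "Bistorta vivipara")
)

# Counting the distinct spellings reveals the inconsistency
variants <- species_raw |> count(species)

# Trim and collapse whitespace, then capitalise only the first letter
cleaned <- species_raw |>
  mutate(species = species |> str_squish() |> str_to_sentence())

n_distinct(cleaned$species)  # 2
```

Fixing the names at data entry, for example with a drop-down list in the spreadsheet, avoids the need for this cleaning step.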

Meaningful names

Figure 6: Final doc by PhDcomics.com

Style

Figure 7: Different styles for naming objects. Credit: Allison Horst.

Standards

Use global data standards when available.
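
One widely used standard is ISO 8601 for dates (YYYY-MM-DD); whether it is among the standards meant here is an assumption. The sketch below uses lubridate to convert made-up day-month-year entries to that form.

```r
library(lubridate)

# Illustrative dates as they might be typed in the field (day-month-year)
typed <- c("03.07.2024", "3/7/2024", "03-07-2024")

# dmy() parses day-month-year input with different separators
parsed <- dmy(typed)

# Write the dates out in ISO 8601 form
iso_dates <- format(parsed, "%Y-%m-%d")
```

All three entries become `"2024-07-03"`, which is unambiguous and sorts correctly as text.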

Figure 8